DOC: Add floating point precision on writing/reading to csv (#13159) … #62770
Co-authored-by: William Ayd <[email protected]>
| ``df.to_csv('file.csv', float_format='%.17g')`` allows for floating point precision to be specified when writing to the CSV file. In this example, this ensures that the floating point is written in this exact format of 17 significant digits (64-bit float). | ||
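A minimal sketch of the write side, assuming a small hypothetical DataFrame (the column name `x` and the values are illustrative and not taken from the PR). It compares the default CSV text with `float_format='%.17g'` and with a deliberately short format:

```python
import pandas as pd

# Hypothetical data: neither value is exactly representable in binary.
df = pd.DataFrame({"x": [0.1, 1 / 3]})

# With no path argument, to_csv returns the CSV text as a string.
default_csv = df.to_csv(index=False)
full_csv = df.to_csv(index=False, float_format="%.17g")  # 17 significant digits
short_csv = df.to_csv(index=False, float_format="%.3g")  # lossy: 3 significant digits

print(default_csv)
print(full_csv)
print(short_csv)
```

The 17-digit output looks noisier than the default, but it pins down each 64-bit value unambiguously, while the 3-digit output discards information that cannot be recovered on read.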
| ``df = pd.read_csv('file.csv', float_precision='round_trip')`` allows for floating point precision to be specified when reading from the CSV file. This is guaranteed to round-trip values after writing to a file and Pandas will read the numbers without losing or changing decimal places. |
| ``df = pd.read_csv('file.csv', float_precision='round_trip')`` allows for floating point precision to be specified when reading from the CSV file. This is guaranteed to round-trip values after writing to a file and Pandas will read the numbers without losing or changing decimal places. | |
| ``df = pd.read_csv('file.csv', float_precision='round_trip')`` allows for floating point precision to be specified when reading from the CSV file. This is guaranteed to round-trip values after writing to a file and pandas will read the numbers without losing or changing decimal places. |
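A minimal sketch of the read side, assuming hypothetical CSV text held in memory rather than a real `file.csv`. It checks that `float_precision='round_trip'` parses each field to the same value that Python's own `float()` produces for that text:

```python
import io

import pandas as pd

# Hypothetical CSV content with 17-significant-digit values.
text = "x\n0.10000000000000001\n0.33333333333333331\n12345.678901234567\n"

# float_precision is an option of the default C parser engine.
parsed = pd.read_csv(io.StringIO(text), float_precision="round_trip")

# Reference values: Python's float() applied to the same strings.
expected = [float(field) for field in text.split()[1:]]

print(parsed["x"].tolist() == expected)
```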
| Floating Point Precision on Writing and Reading to CSV Files | ||
| +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | ||
| Floating Point Precision inaccuracies when writing and reading to CSV files happen due to how the numeric data is represented and parsed in pandas. |
| Floating Point Precision inaccuracies when writing and reading to CSV files happen due to how the numeric data is represented and parsed in pandas. | |
| Floating point precision inaccuracies when writing and reading to CSV files happen due to how the numeric data is represented and parsed in pandas. |
| +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | ||
| Floating Point Precision inaccuracies when writing and reading to CSV files happen due to how the numeric data is represented and parsed in pandas. | ||
| During the write process, pandas converts all the numeric values into text that is stored as bytes in the CSV file. However, when we read the CSV back, pandas parses those |
| During the write process, pandas converts all the numeric values into text that is stored as bytes in the CSV file. However, when we read the CSV back, pandas parses those | |
| During the write process, pandas converts all the numeric values into text that is stored as bytes in the CSV file. However, when the CSV is read back, pandas parses those |
| Floating Point Precision on Writing and Reading to CSV Files | ||
| +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | ||
| Floating Point Precision inaccuracies when writing and reading to CSV files happen due to how the numeric data is represented and parsed in pandas. |
I think the content in its current state talks about implementation details and is a bit harsh on pandas, to the extent that I think it's missing the larger point that floating point values are by nature not exact.
Taking a step back - what is the overall goal that this documentation is trying to achieve?
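The point that floating point values are inexact by nature can be illustrated without pandas at all. The sketch below uses only the Python standard library; the specific values are illustrative choices, not taken from the PR:

```python
from decimal import Decimal

# 0.1, 0.2 and 0.3 have no exact binary representation, so the sum is off by one ulp.
total = 0.1 + 0.2
print(total == 0.3)
print(repr(total))

# Decimal(float) exposes the exact value actually stored for the literal 0.1.
stored = Decimal(0.1)
print(stored)
```

Any text format, CSV included, can only preserve the stored value, not the decimal literal the user originally had in mind.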
Hi Will, thank you for the feedback. The overall goal is to explain that, by default, due to computer arithmetic outside of our control, floating point numbers are not always stored or returned with exact accuracy.
My intent with this doc addition is to show that floating point numbers cannot always be stored precisely, and that differences can arise when values are converted and later read back. To help with this, pandas provides the float_format parameter (for writing) and the float_precision="round_trip" parameter (for reading), which help preserve precision when writing to and reading from CSV, so that values come back just as they were and no precision is lost.
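The combination described above can be sketched end to end. This is an illustration under assumptions: the `roundtrip` helper, the random data, and the `%.6g` comparison format are hypothetical and not part of the PR.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: 1000 random 64-bit floats with a fixed seed.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.standard_normal(1000)})


def roundtrip(frame, float_format):
    """Write the frame to CSV text, then read it back with round_trip parsing."""
    text = frame.to_csv(index=False, float_format=float_format)
    return pd.read_csv(io.StringIO(text), float_precision="round_trip")


exact = roundtrip(df, "%.17g")  # 17 significant digits on write
lossy = roundtrip(df, "%.6g")   # only 6 significant digits on write

print(df.equals(exact))
print(df.equals(lossy))
```

Note that `float_precision="round_trip"` cannot restore digits that a short `float_format` dropped at write time, which is why both options matter together.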
Added section for Floating Point Precision on Writing and Reading to CSV Files to address issue #13159 with detailed explanation of why the precision loss happens as well as a code example demonstrating the solution.
doc/source/user_guide/io.rst file if fixing a bug or adding a new feature.